4 - Regression Discontinuity Design
Utrecht School of Economics
2025
Why?
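For an RDD we need: a running variable \(X_i\), a cutoff \(c_0\), and a treatment \(D_i\) whose assignment changes discontinuously at the cutoff.
There are two kinds of RDD: sharp RDD, where treatment switches from 0 to 1 at the cutoff, and fuzzy RDD, where only the probability of treatment jumps at the cutoff.
There are two ways to deal with a non-linear relationship between the running variable and the outcome: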
Directly model non-linearity: parametric approach
Only use observations close to the threshold: non-parametric approach
In the parametric approach, we add non-linear terms to the regression equation:
By adding polynomials: \[Y_i = \alpha + \beta_1 X_i + \beta_2 X_i^2 + \delta D_i + \varepsilon_i\]
By adding interactions with the running variable: \[Y_i = \alpha + \beta_1 (X_i-c_0) + \beta_2 D_i + \beta_3 (X_i-c_0)\times D_i + \varepsilon_i\]
A combination of the two: \[\begin{aligned} Y_i = {}& \alpha + \beta_1 (X_i-c_0) + \beta_2 (X_i-c_0)^2 + \beta_3 D_i \\ &+ \beta_4 (X_i-c_0)\times D_i + \beta_5 (X_i-c_0)^2\times D_i + \varepsilon_i \end{aligned}\]
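A minimal sketch of the combined specification on simulated data. The data-generating process, the cutoff at 0 and the true jump of 2 are assumptions made up for illustration. Because the running variable is centred at the cutoff, the coefficient on `D` is the estimated jump at \(c_0\).

```r
# Simulated (hypothetical) data: quadratic relationship, true jump of 2 at c0 = 0
set.seed(42)
n  <- 5000
c0 <- 0
X  <- runif(n, -1, 1)
D  <- as.numeric(X >= c0)                        # sharp assignment at the cutoff
Y  <- 1 + 0.5 * X + 0.8 * X^2 + 2 * D + rnorm(n, sd = 0.5)

Xc <- X - c0                                     # centre the running variable

# Polynomial terms plus interactions with D (separate quadratic on each side);
# I() keeps the square from being read as formula syntax
fit <- lm(Y ~ Xc + I(Xc^2) + D + Xc:D + I(Xc^2):D)

delta_hat <- unname(coef(fit)["D"])              # estimated jump at the cutoff
delta_hat
```

Because the simulated relationship is quadratic, this specification is correctly specified here; with real data the polynomial order is a modelling choice.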
Global least-squares fits like these can predict poorly near the cutoff when the functional form is misspecified.
Alternatively, we can use a local linear regression:
The closer we are to the cutoff, the less non-linearity matters.
We can estimate the treatment effect by fitting a linear regression only to observations within a bandwidth \(b\) of the cutoff (\(c_0 - b \le X_i \le c_0 + b\)): \[Y_i = \alpha + \beta X_i + \delta D_i + \varepsilon_i\]
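A minimal local linear sketch on simulated data, assuming a cutoff at 0, a hypothetical bandwidth `b = 0.2` and a true jump of 2 (all invented for illustration). It fits the equation with a common slope on both sides; a common variant allows separate slopes through an interaction with `D`.

```r
# Simulated (hypothetical) data: non-linear in X, true jump of 2 at c0 = 0
set.seed(1)
n  <- 20000
c0 <- 0
b  <- 0.2                                        # hypothetical bandwidth
X  <- runif(n, -1, 1)
D  <- as.numeric(X >= c0)
Y  <- exp(X) + 2 * D + rnorm(n, sd = 0.3)

# Keep only observations within the bandwidth around the cutoff
local <- data.frame(Y, X, D)[abs(X - c0) <= b, ]

# Linear regression on the local sample: Y = alpha + beta * X + delta * D
fit_local   <- lm(Y ~ X + D, data = local)
delta_local <- unname(coef(fit_local)["D"])
delta_local
```

Within a narrow window, `exp(X)` is close to linear, so the linear fit does little harm even though the global relationship is curved.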
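The bandwidth \(b\) is a key parameter in the RDD: a wider bandwidth uses more observations and gives more precise estimates, but lets non-linearity bias the estimate; a narrower one reduces this bias at the cost of precision.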
How to choose the bandwidth?
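To make the bias-variance trade-off concrete, here is a sketch that re-estimates the jump at several bandwidths on simulated data. The outcome is curved only to the right of the cutoff, and the true jump is 2. Both choices are assumptions for illustration. This is not a bandwidth selector. In practice, researchers often use data-driven selectors, for example those in the `rdrobust` R package.

```r
# Simulated (hypothetical) data: curvature only to the right of c0 = 0,
# true jump of 2
set.seed(7)
n  <- 20000
c0 <- 0
X  <- runif(n, -1, 1)
D  <- as.numeric(X >= c0)
Y  <- ifelse(X >= c0, 3 * X^2, 0) + 2 * D + rnorm(n, sd = 0.3)
dat <- data.frame(Y, Xc = X - c0, D)

# Local linear estimate with separate slopes on each side of the cutoff
rd_estimate <- function(b) {
  local <- dat[abs(dat$Xc) <= b, ]
  unname(coef(lm(Y ~ Xc * D, data = local))["D"])
}

bandwidths <- c(0.1, 0.25, 0.5, 1)
estimates  <- sapply(bandwidths, rd_estimate)
data.frame(bandwidth = bandwidths, estimate = round(estimates, 3))
```

The widest bandwidth gives the most precise but most biased estimate. Here, a straight line fitted over the whole right-hand side misses the curvature near the cutoff.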
| Group | On Medicare | Any Insurance | Private Coverage | 2+ Forms of Coverage | Managed Care |
|---|---|---|---|---|---|
| Overall Sample | 59.7 (4.1) | 9.5 (0.6) | -2.9 (1.1) | 44.1 (2.8) | -28.4 (2.1) |
| **White non-Hispanic** | | | | | |
| High School Dropout | 58.5 (4.6) | 13.0 (2.7) | -6.2 (3.3) | 44.5 (4.0) | -25.0 (4.5) |
| High School Graduate | 64.7 (5.0) | 7.6 (0.7) | -1.9 (1.6) | 51.8 (3.8) | -30.3 (2.6) |
| Some College | 68.4 (4.7) | 4.4 (0.5) | -2.3 (1.8) | 55.1 (4.0) | -40.1 (2.6) |
| **Minority** | | | | | |
| High School Dropout | 44.5 (3.1) | 21.5 (2.1) | -1.2 (2.5) | 19.4 (1.9) | -8.3 (3.1) |
| High School Graduate | 44.6 (4.7) | 8.9 (2.8) | -5.8 (5.1) | 23.4 (4.8) | -15.4 (3.5) |
| Some College | 52.1 (4.9) | 5.8 (2.0) | -5.4 (4.3) | 38.4 (3.8) | -22.3 (7.2) |
| **Classified by Ethnicity Only** | | | | | |
| White non-Hispanic | 65.2 (4.6) | 7.3 (0.5) | -2.8 (1.4) | 51.9 (3.5) | -33.6 (2.3) |
| Black non-Hispanic | 48.5 (3.6) | 11.9 (2.0) | -4.2 (2.8) | 27.8 (3.7) | -13.5 (3.7) |
| Hispanic | 44.4 (3.7) | 17.3 (3.0) | -2.0 (1.7) | 21.7 (2.1) | -12.1 (3.7) |
The table reports the estimated discontinuity at age 65 in each type of coverage, in percentage points (standard errors in parentheses).
Eligibility increases Medicare take-up by 59.7 percentage points in the overall sample.
Eligibility reduces private coverage by 2.9 percentage points in the overall sample.
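As a quick check on precision, this sketch computes approximate 95% confidence intervals for the overall-sample row. The estimates and standard errors are copied from the table. The normal approximation (±1.96 standard errors) is an assumption of this sketch, not something the table states.

```r
# Overall-sample estimates and standard errors from the table (percentage points)
est <- c(on_medicare = 59.7, any_insurance = 9.5, private = -2.9,
         two_plus_forms = 44.1, managed_care = -28.4)
se  <- c(4.1, 0.6, 1.1, 2.8, 2.1)

# Approximate 95% confidence intervals (normal approximation)
ci <- data.frame(estimate = est,
                 lower = est - 1.96 * se,
                 upper = est + 1.96 * se)
round(ci, 1)
```

Every interval in this row excludes zero, including the small drop in private coverage (roughly -5.1 to -0.7 percentage points).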
Identification relies on continuity of the expected potential outcomes \(E[y^0 \mid a]\) and \(E[y^1 \mid a]\) in the running variable (age \(a\)) at the cutoff: \[\lim_{a \to 65^+} E[y^1 \mid a] - \lim_{a \to 65^-} E[y^0 \mid a] = \delta\]
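The derivation below spells out why continuity delivers the effect at the cutoff. It assumes a sharp design: everyone aged 65 or over is treated, so the observed outcome is \(y = y^1\), and everyone younger is untreated, so \(y = y^0\). This is a simplification, since Medicare take-up does not jump from 0 to 100%. Continuity is what allows the second step:

\[\begin{aligned}
\lim_{a \to 65^+} E[y \mid a] - \lim_{a \to 65^-} E[y \mid a]
&= \lim_{a \to 65^+} E[y^1 \mid a] - \lim_{a \to 65^-} E[y^0 \mid a] \\
&= E[y^1 \mid a = 65] - E[y^0 \mid a = 65] \\
&= E[y^1 - y^0 \mid a = 65] = \delta
\end{aligned}\]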
What else changes at 65?
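One common way to probe this question is a placebo RD: rerun the same regression with a predetermined covariate as the outcome and check that it does not jump at the cutoff. The sketch below uses simulated data. The covariate `z`, the age range and the 5-year bandwidth are hypothetical. Note that such a check only covers observables; it cannot rule out another policy that also starts at 65.

```r
# Simulated (hypothetical) data: age as running variable, cutoff at 65
set.seed(3)
n   <- 20000
c0  <- 65
age <- runif(n, 55, 75)
D   <- as.numeric(age >= c0)
# Hypothetical predetermined covariate that varies smoothly with age
# and, by construction, does NOT jump at 65
z   <- 12 - 0.05 * (age - c0) + rnorm(n, sd = 1)
dat <- data.frame(z, ac = age - c0, D)

b <- 5                                           # hypothetical bandwidth (years)
local   <- dat[abs(dat$ac) <= b, ]
placebo <- lm(z ~ ac * D, data = local)          # same RD regression, covariate as outcome
summary(placebo)$coefficients["D", ]             # estimate, SE, t value, p-value
```

A large, significant jump in a covariate like this would suggest that something other than the treatment also changes at the cutoff.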
In a follow-up paper, Card and coauthors show that Medicare eligibility reduces mortality by around 1 percentage point among severely ill patients admitted to hospital through the emergency department.
The results are robust to different bandwidths and functional forms.
RDD estimates the effect of the treatment for individuals close to the cutoff.
In other words, RDD gives you a local average treatment effect (LATE) at the cutoff.
Extrapolating the results to individuals far from the threshold requires strong assumptions.
RDD typically estimates only local effects!
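A small simulation shows what "local" means when effects are heterogeneous. In this hypothetical setup, the treatment effect is \(1 + X_i\), so it averages about 1 over the whole population but equals 1.5 at a cutoff placed at 0.5. The cutoff, bandwidth and effect function are all assumptions for illustration.

```r
# Simulated (hypothetical) data with an effect that grows with X
set.seed(11)
n   <- 20000
c0  <- 0.5
X   <- runif(n, -1, 1)
D   <- as.numeric(X >= c0)
tau <- 1 + X                                     # individual treatment effect
Y   <- 0.5 * X + tau * D + rnorm(n, sd = 0.3)

mean(tau)                                        # population average effect, close to 1

b <- 0.1                                         # hypothetical bandwidth
local  <- data.frame(Y, Xc = X - c0, D)[abs(X - c0) <= b, ]
rd_hat <- unname(coef(lm(Y ~ Xc * D, data = local))["D"])
rd_hat                                           # close to the effect at the cutoff, 1 + c0 = 1.5
```

The RD estimate recovers the effect for units at the cutoff, not the population average.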
We call a discontinuity fuzzy when the probability of treatment jumps at the cutoff, but not from 0 to 1 as in a sharp RDD.
Formally:
\[\lim_{X_i \to c_0^+} P(D_i = 1 \mid X_i) \neq \lim_{X_i \to c_0^-} P(D_i = 1 \mid X_i)\]
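The continuity assumption of the sharp RDD still applies. Because the cutoff is used as an instrument, the usual IV assumptions (exclusion and monotonicity) are needed as well.
With fuzzy RDD we can use being above the threshold, \(Z_i = \mathbf{1}[X_i \ge c_0]\), as an instrument for the treatment.
As for an IV, we can estimate the treatment effect with two-stage least squares (2SLS) on observations within the bandwidth: \[D_i = \gamma_0 + \gamma_1 (X_i - c_0) + \pi Z_i + \nu_i\] \[Y_i = \alpha + \beta (X_i - c_0) + \delta \hat{D}_i + \varepsilon_i\]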
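A minimal fuzzy-RDD sketch on simulated data. In this hypothetical setup, the probability of treatment jumps from 0.2 to 0.7 at the cutoff and the true effect is 3. The bandwidth and all numbers are assumptions for illustration. With one instrument and the same controls in both equations, the 2SLS point estimate equals the ratio of the reduced-form jump to the first-stage jump (a Wald-type estimator). The sketch computes that ratio with two `lm` fits. Standard errors from these separate fits are not valid IV standard errors.

```r
# Simulated (hypothetical) data: fuzzy assignment, true effect of 3
set.seed(5)
n  <- 100000
c0 <- 0
X  <- runif(n, -1, 1)
Z  <- as.numeric(X >= c0)                        # instrument: above the cutoff
D  <- rbinom(n, 1, ifelse(Z == 1, 0.7, 0.2))     # P(treated) jumps from 0.2 to 0.7
Y  <- 1 + 0.5 * X + 3 * D + rnorm(n, sd = 1)

b <- 0.25                                        # hypothetical bandwidth
local <- data.frame(Y, D, Z, Xc = X - c0)[abs(X - c0) <= b, ]

# First stage: jump in the probability of treatment at the cutoff
first_stage  <- unname(coef(lm(D ~ Xc * Z, data = local))["Z"])
# Reduced form: jump in the outcome at the cutoff
reduced_form <- unname(coef(lm(Y ~ Xc * Z, data = local))["Z"])

# Just-identified IV: ratio of the two jumps equals the 2SLS point estimate
delta_fuzzy <- reduced_form / first_stage
c(first_stage = first_stage, reduced_form = reduced_form, delta = delta_fuzzy)
```

Dividing by the first stage rescales the jump in the outcome by the share of units whose treatment status actually changes at the cutoff.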
We take a paper by Fetter (2013) as an example.
Fetter estimates the effect of the GI Bill on home ownership.
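The GI Bill was a policy that provided mortgage subsidies to veterans of World War II and, later, of the Korean War.
The discontinuity is in date of birth: veterans of the big wars (WWII and the Korean War) were eligible for the GI Bill, but men born too late could not have served in these wars. In the data, the running variable is quarter of birth relative to the Korean War cutoff (`qob_minus_kw`).
But not everyone born before the cutoff joined the military, so eligibility changes the probability of treatment without determining it: this is a fuzzy RDD.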
```r
library(tidyverse); library(fixest); library(modelsummary)
vet <- causaldata::mortgages

# Create an "above-cutoff" variable as the instrument
vet <- vet %>% mutate(above = qob_minus_kw > 0)

# Impose a bandwidth of 12 quarters on either side
vet <- vet %>% filter(abs(qob_minus_kw) < 12)

m <- feols(home_ownership ~
    nonwhite |                 # Control for race
    bpl + qob |                # fixed effect controls
    qob_minus_kw*vet_wwko ~    # Instrument our standard RDD
    qob_minus_kw*above,        # with being above the cutoff
    se = 'hetero',             # heteroskedasticity-robust SEs
    data = vet)

# And look at the results
msummary(m, stars = c('*' = .1, '**' = .05, '***' = .01))
```

| | (1) |
|---|---|
| fit_qob_minus_kw | -0.007*** |
| | (0.002) |
| fit_vet_wwko | 0.170*** |
| | (0.046) |
| fit_qob_minus_kw × vet_wwko | -0.003 |
| | (0.003) |
| nonwhite | -0.190*** |
| | (0.007) |
| Num.Obs. | 56901 |
| R2 | 0.053 |
| R2 Adj. | 0.052 |
| R2 Within | 0.037 |
| R2 Within Adj. | 0.037 |
| AIC | 68659.5 |
| BIC | 69187.5 |
| RMSE | 0.44 |
| Std.Errors | Heteroskedasticity-robust |
| FE: bpl | X |
| FE: qob | X |
| * p < 0.1, ** p < 0.05, *** p < 0.01 | |